# Adding multi file prediction with Array Job #1

Thomasbush9 wants to merge 1 commit into KempnerInstitute:main from
## Conversation
**Pull Request Overview**
This PR adds support for multi-file protein folding predictions using SLURM array jobs. The enhancement allows batch processing of multiple FASTA files by automatically splitting them into chunks and submitting parallel prediction jobs.
- Introduces a bash script to split input files into chunks and submit array jobs
- Adds a SLURM array job script for processing individual file chunks
- Updates documentation to explain the new multi-file prediction workflow
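The split-and-submit flow described above can be sketched as a small shell function. This is an illustrative sketch, not the PR's actual code: the function name `split_fastas` and the `chunk_<i>.txt` naming are assumptions, and the `sbatch` line is shown only for context.

```shell
# Hypothetical sketch of the chunking step: distribute the .fasta files in
# input_dir across n chunk lists (chunk_0.txt .. chunk_<n-1>.txt) round-robin.
split_fastas() {
  local input_dir=$1 n=$2 out_dir=$3
  mkdir -p "$out_dir"
  local i=0
  for f in "$input_dir"/*.fasta; do
    [ -e "$f" ] || continue                    # skip when the glob matches nothing
    echo "$f" >> "$out_dir/chunk_$(( i % n )).txt"
    i=$(( i + 1 ))
  done
}

# Example usage (the sbatch step requires a SLURM cluster):
# split_fastas /data/fasta_files 5 /data/jobs/chunks_$(date +%Y%m%d_%H%M%S)
# sbatch --array=0-4 single_prediction_array.slrm /data/jobs/chunks_<timestamp>
```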
**Reviewed Changes**
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `split_and_pred.sh` | Main orchestration script that splits input files into chunks and submits SLURM array jobs |
| `single_prediction_array.slrm` | SLURM array job script that processes individual file chunks through the protein folding pipeline |
| `README.md` | Updated documentation explaining the new multi-file prediction workflow and usage instructions |
**`split_and_pred.sh`** (review comment)

The usage comment references 'split_and_submit.sh', but the actual filename is 'split_and_pred.sh'. Additionally, the example uses an 'images' directory, which is misleading for a protein folding workflow that expects FASTA files.

```diff
-# Usage: ./split_and_submit.sh INPUT_DIR N OUTPUT_PARENT_DIR
-# Example: ./split_and_submit.sh /data/images 5 /data/jobs
+# Usage: ./split_and_pred.sh INPUT_DIR N OUTPUT_PARENT_DIR
+# Example: ./split_and_pred.sh /data/fasta_files 5 /data/jobs
```
**`single_prediction_array.slrm`** (review comment)

The hardcoded email address should not be committed to the repository; it should be removed or made configurable so that other users do not receive unintended notifications. Note that SLURM does not expand environment variables inside `#SBATCH` directives, so a directive such as `#SBATCH --mail-user=${MAIL_USER}` would not work; instead, drop the directive and pass the address on the command line at submission time:

```diff
-#SBATCH --mail-user=thomasbush52@gmail.com
 # Use array-aware log names to avoid clobbering:
 #SBATCH --output=/n/home06/tbush/job_logs/%x.%A_%a.out
+# Supply your email at submission time instead of hardcoding it, e.g.:
+#   sbatch --mail-user=your.email@domain.com single_prediction_array.slrm
 set -euo pipefail
```
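For context on the array script, here is a minimal sketch (assumed, not taken from the PR) of how each array task can select its chunk list via the `SLURM_ARRAY_TASK_ID` variable that SLURM sets for every task in an array job; the helper name `run_chunk` and the `chunk_<i>.txt` naming are hypothetical:

```shell
# Hypothetical helper: pick this task's chunk list and iterate over its FASTA paths.
run_chunk() {
  local chunk_dir=$1
  # SLURM exports SLURM_ARRAY_TASK_ID to each task of an array job.
  local chunk_file="$chunk_dir/chunk_${SLURM_ARRAY_TASK_ID}.txt"
  while IFS= read -r fasta; do
    echo "would predict: $fasta"   # placeholder for the real folding command
  done < "$chunk_file"
}
```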
**`README.md`** (review comment)

The documentation references 'slrm_scripts/multi_pred.sh', but the actual script is named 'split_and_pred.sh' and is located in the current directory, not in a 'slrm_scripts' subdirectory.

````diff
-```{bash}
-source slrm_scripts/multi_pred.sh INPUT_DIR N OUT_DIR
+```bash
+source ./split_and_pred.sh INPUT_DIR N OUT_DIR
````
**`README.md`** (review comment)

The documentation references 'slrm_scripts/single_prediction.slrm', but the actual script is named 'single_prediction_array.slrm' and is located in the current directory.

```diff
 - Divide the input dir files into n sets, generate .txt containing the path to each .fasta (one per set)
 - create an out_dir/chunks_timestamp/ directory where the predictions will be stored
-- start N jobs launching the script: slrm_scripts/single_prediction.slrm n times (you can modify the resource of each job by modifying this script)
+- start N jobs launching the script: ./single_prediction_array.slrm n times (you can modify the resource of each job by modifying this script)
```
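The "start N jobs" step in the README maps to a single array submission covering indices 0 through N-1. A hedged sketch, where the helper name `array_range` is illustrative:

```shell
# Build the --array range string for N chunks, e.g. 5 chunks -> "0-4".
array_range() {
  local n=$1
  echo "0-$(( n - 1 ))"
}

# Example submission (requires SLURM; CHUNK_DIR is the chunk-list directory):
# sbatch --array="$(array_range 5)" ./single_prediction_array.slrm "$CHUNK_DIR"
```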
No description provided.